discussed from a methodological viewpoint. However, no attempt is made to develop
methods of evaluating programs. In Part I, the structure of an educational program
is viewed as a system with three components: inputs, transformation of inputs into
outputs, and outputs. Part II discusses the necessary condition for a program to be
a system (the presence of feedback loops), citing as one example the school system
with an evaluation unit. In Part III, the possibility of mapping experimental designs
into social space characterized by feedback loops is confirmed while refuting
statements by Stufflebeam to the contrary. Part IV examines the historical
precedence for the findings and concludes that it is possible, from a methodological
viewpoint, to implement a rigorous experimental design and also to provide feedback
for managerial decisionmaking in the context of action research. (HW)
Introduction



The following essay discusses the logic of the evaluation of educational
and other action programs. As such, it is a methodological
statement, introducing methods only incidentally. Readers who seek to
discover methods of evaluating programs here will be disappointed;
for that they must look elsewhere. Only formal problems of the
possibility of evaluation are treated. In fact, this essay has so little
regard for methods that we have not concerned ourselves at all with
constructive arguments (in the sense that our proofs will not lead to
algorithms).

This document need not be justified further: anyone who knows the
literature on evaluation is aware of the terrible conceptual muddle of
educational project assessment. Those who don't know the literature
should have stopped reading a paragraph ago. The essay consists of
four parts. Part I gives the structure of an educational program.
Part II discusses the necessary condition for the program to be a
system, the presence of feedback loops. Part III discusses the possibility
of mapping experimental designs into social space characterized by
feedback loops. Part IV examines the historical precedence for the
findings reported.
I.



Theorists in the social sciences and education are thinking more
and more of their subject matter in terms of "systems," to which, as
to any system, process and adaptive control are essential. The
following is an examination of such a system and a discussion of several
of its relevant properties.

Three things are essential to any process: first, inputs or raw
materials; second, a transformation to convert the raw material; and
third, the output or finished product. In an educational system, for
example, the student might be the input; the system, the combination of
curriculum, teachers, physical plant, etc.; and the output, the high
school graduate.

A means of guaranteeing that the system will produce the desired 
output must also be provided. If society wants more electrical engineers, 
for example, it does not want electronic technicians. Probably either job 
could be performed by the same individual. The job he does perform 
depends, in large part, upon the standards of the educational system of 
which he is the product. One set of standards or criteria, by specifying 
the level of abilities and competencies required for the job, defines an 
electrical engineer, while a second set of criteria defines the technician. 
When the individual can meet one of these sets of criteria, he can fill
the specified job.

Criteria are established, then, in order to define and permit control
of the system. When it is noticed that the criteria are not being met,
appropriate changes are made in the program to reestablish accord
with the criteria and eliminate the discrepancies. This requires a feedback
or control loop as a formal property of the system. It should be
emphasized that there are two ways the violation of standards can be
characterized: either the individual product is "defective," or the
system is inadequate to its task.

Within the system are a number of programs designed to achieve
the specific goals of the system. It is possible to schematize a program
completely by a consideration of the characteristic kinds of behavior
involved in that program. These kinds, or dimensions, of behavior fall
into three classes: "input variables," i.e., a set of dimensions of behavior
which exists upon the subject's entry into the program, and which will be
changed by the action of the program; "output variables," i.e., a set of
dimensions of behavior, identical to the input variables, existing at the
point of exit from the program as the result of the program's action on
the input variables; and "preconditions," a set of dimensions of behavior
which is associated with the input and output variables but which will be
unaffected by the program. In fact, the collection of dimensions of
behavior indicated here defines a vector space. This accounts for the
equal number of input and output dimensions (by the Principle of Dimensional
Homogeneity). This becomes important at a later stage of our discussion,
when we consider program change.
A compensatory program in the Pittsburgh Public Schools titled the
"Transition Room Program"* affords a good example. The purpose of the
Transition Room is to help underachieving children solve their reading
problems before they enter the fourth or fifth grade. Up to the fourth and
fifth grades, learning to read is an end in itself. In these grades, however,
it becomes a means to the acquisition of knowledge in other substantive
areas: a "transition" is made from reading as subject matter to reading
as a communication skill. The Transition Room Program is designed to
facilitate this transition. In order to reach those children most in need
of aid, selection criteria have been set up: children entering the program
must have MAT reading achievement scores at least one year below grade
level, indicating underachievement; must have an I.Q. of 85 or above,
indicating a capacity for benefiting from remedial instruction; and must
be enrolled in third or fourth grade. The goal of the program is to raise
the MAT score to grade level.
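
The selection criteria just listed can be read as a simple predicate over the pupil's dimensions of behavior. The following is only a minimal sketch: the Pupil record, its field names, and the expression of MAT scores and grade level as grade equivalents are assumptions made for illustration, not details of the Pittsburgh program.

```python
from dataclasses import dataclass


@dataclass
class Pupil:
    mat_reading: float  # MAT reading score as a grade equivalent (assumed scale)
    grade_level: float  # grade-level norm for the pupil's placement (assumed)
    iq: int             # I.Q. score
    grade: int          # grade of enrollment


def meets_selection_criteria(pupil):
    """Apply the three stated criteria: underachievement, capacity, enrollment."""
    underachieving = pupil.grade_level - pupil.mat_reading >= 1.0
    can_benefit = pupil.iq >= 85
    enrolled = pupil.grade in (3, 4)
    return underachieving and can_benefit and enrolled


# Hypothetical pupils, invented for the example.
candidate = Pupil(mat_reading=2.0, grade_level=3.5, iq=90, grade=3)
near_level = Pupil(mat_reading=3.0, grade_level=3.5, iq=90, grade=3)
```

Here meets_selection_criteria(candidate) holds, while near_level fails the underachievement test because the gap is only half a year.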

When considered in terms of dimensions of behavior, the program 
may be broadly described by the following diagram: 



* This example is discussed further in my "Evaluation of Public
School Programs," a paper presented at L.R.D.C. in Pittsburgh,
18 November 1968.
In the case of the Transition Room, there is only one input and
associated output variable. In other programs there may be several,
each to be acted upon by the program to produce an associated output
variable. For each pair of change variables (that is, for each input-output
pair) there is one process to transform the value of the input dimension
to the value of the output dimension. The description of the process involved
in the program may be made more specific if it is borne in mind that it
is necessary to find a condition sufficient to effect the change from input
to output for each pair. This becomes clear with consideration of
criteria.

Criteria come into existence when we specify thresholds or ranges
of values for each dimension of behavior. Specifying values for the input
and precondition variables provides a description of selection criteria.
Specifying values of the output variable provides a description of the
goals of the program. To take the Transition Room example again,




Further consideration of this example raises another point of 
interest. At the end of two school years, the student leaves the Transition 
Room because he no longer fulfills the precondition of third or fourth 
grade enrollment, whether his MAT score is at grade level or not. 

This points out the existence of an output variable that has no criterion
associated with it; that is, there are states of the program, at termination,
independent of goal achievement. Before goal discrepancies are evaluated,
a way must be found to characterize this terminal state. As we can
see, if the output state is described independently of the goal state, but
in terms of the same dimensions, it becomes possible to
characterize these discrepancies.
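
The distinction between the terminal state and the goal state can be sketched in a few lines. This is an illustration under stated assumptions: the function name, the dictionary keys, and the use of grade-equivalent scores are hypothetical, and the discrepancy is taken simply as the shortfall of the MAT score from grade level.

```python
def terminal_state(grade, mat_reading, grade_level):
    """Describe the exit state on the same dimension as the goal,
    independently of whether the goal was reached."""
    # The pupil leaves once the enrollment precondition (grade 3 or 4) fails.
    left_program = grade not in (3, 4)
    # Shortfall from the goal (MAT at grade level); zero when the goal is met.
    discrepancy = max(0.0, grade_level - mat_reading)
    return {
        "left_program": left_program,
        "goal_met": discrepancy == 0.0,
        "discrepancy": discrepancy,
    }


# Hypothetical pupil who has moved on to fifth grade while still below level.
state = terminal_state(grade=5, mat_reading=4.2, grade_level=5.0)
```

In this invented case the pupil has left the program although the goal was not met, which is exactly the state that has no criterion attached to it.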

II.

The type of program that corresponds to the schema presented
above is the "open-loop" control or implicit system. The next approximation
to the complexities of the educational system is the simple closed-loop
control system, or "feedback" control system: for example, the school
system with an evaluation unit. This is what can be called an "action
program," and is explicitly a system.

Given a continuous evaluation activity, system control becomes
essentially a statistical problem. In quality control of, say, ball-bearings,
the steps would be, first, selection of variables (our input-output pair)
describing the materials; second, specification of parameters to provide
criteria defining both the acceptable product and the acceptable functioning
of the process; and then comparison of the product and process with the
criteria. For example, if the production line transforms Babbitt metal
and steel into ball-bearings, where diameter and weight are the descriptive
variables, measures of central tendency and dispersion (the mean
and standard deviation) will be specified in order to determine the tolerable
amount of dispersion. The mean constitutes the expected value: if the
weight of the ball-bearing is to be 3 grams, then ideally the weight of each
and every ball-bearing will be 3 grams (of course measured on a perfect
balance). Incidentally, preconditions might well be included: the specific
gravity, tensile strength, etc. of the materials might be indicated to fall
in a specified range.

Of course, the quality control engineer never attempts to make the
observed distribution, derived from the measurement of the variables
selected as descriptive of the material, coincide exactly in its parameters
with the expected distribution. Any induced change in the observed
distribution, or output measure, that is, any action to make the product
conform more closely with the criteria, must be considered a cost and
subtracted from the cost incurred by the number of rejects in the process.
Such action must be undertaken only to minimize overall cost (while
maintaining output constant).

The two key elements in quality control appear, then, to be the
specification of the expected values of the variables selected as descriptive
of the product, and the measurement of the actual material or substance
undergoing the process in order to ascertain whether the product
is exceeding the tolerable deviation from the specified values. We have
seen that the decision to take action as a result of these actual measurements
rests on considerations of efficiency.
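
The two elements just named, together with the efficiency rule, can be sketched as follows. This is a minimal illustration and not a quality-control procedure taken from the text: the tolerance values, the sample weights, and the function names are assumptions chosen for the example, with only the 3-gram target coming from the discussion above.

```python
from statistics import mean, stdev


def within_tolerance(weights, target, max_mean_error, max_stdev):
    """Compare observed central tendency and dispersion with the criteria."""
    observed_mean = mean(weights)
    observed_sd = stdev(weights)  # sample standard deviation
    mean_ok = abs(observed_mean - target) <= max_mean_error
    sd_ok = observed_sd <= max_stdev
    return mean_ok and sd_ok


def should_correct(reject_cost, correction_cost):
    """Act on the process only when correcting it costs less than the rejects."""
    return correction_cost < reject_cost


# Hypothetical weights in grams for a 3-gram ball-bearing.
sample = [2.98, 3.01, 3.00, 3.02, 2.99]
in_control = within_tolerance(sample, target=3.0,
                              max_mean_error=0.02, max_stdev=0.05)
```

With these invented figures the sample is in control, so no corrective action would be considered; should_correct only matters once the tolerance is exceeded.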

Consequently, rather than the simple linear equation

    y = ax + b

which describes the open-loop system, we have two independent variables,
x and the parameter a, where a is a function of the goals. Thus we have

    y = f(g)x + b

where f(g) is the statement of a as a function of goals g (see System "S"
following). But how does one find the values defined by the criteria?

A viable technique that we have used in Pittsburgh can be called the
"value-finding" approach. As an example of this, a meeting was held of
the staff members of a program.* At that time they were asked to rank
order six possible objectives of the program. These data were analyzed on
the spot and the results were presented to the staff members later in the
meeting. They were then asked to (1) comment on the points of consensus
and (2) explain why they felt the points of dissensus existed. Consensus
could be considered a manifestation of a value.

The analysis proceeded as follows: The participants ranked the 
objectives from one to six. These "votes" were then arrayed in a 
6x6 table, with the objectives to be ranked comprising the rows, and 
rank-orders one to six comprising the columns. The frequency with 
which a given objective was given a specified rank was the cell entry. 
Variance could then be estimated. A semi-interquartile range of one 
rank or less was said to indicate consensus, and a range of more than 
one, dissensus. 
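
The tabulation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the ballots are invented, the function names are hypothetical, and the semi-interquartile range is computed as half the distance between the 25th and 75th centiles using the "inclusive" quantile method, which may differ from the hand calculation used at the meeting.

```python
from statistics import quantiles


def frequency_table(ballots, n_objectives):
    """Rows are objectives, columns are ranks 1..n; each cell counts how
    often an objective received a given rank."""
    table = [[0] * n_objectives for _ in range(n_objectives)]
    for ballot in ballots:
        for objective, rank in enumerate(ballot):
            table[objective][rank - 1] += 1
    return table


def semi_interquartile_range(ranks):
    """Half the distance between the 25th and 75th centiles."""
    q1, _, q3 = quantiles(ranks, n=4, method="inclusive")
    return (q3 - q1) / 2


def consensus(ballots, n_objectives, threshold=1.0):
    """True for each objective whose SIQR is one rank or less."""
    flags = []
    for objective in range(n_objectives):
        ranks = [ballot[objective] for ballot in ballots]
        flags.append(semi_interquartile_range(ranks) <= threshold)
    return flags


# Hypothetical ballots: ballot[i] is the rank one staff member gave objective i+1.
ballots = [
    [1, 2, 3, 4, 5, 6],
    [2, 6, 1, 3, 4, 5],
    [1, 4, 2, 3, 5, 6],
    [2, 1, 3, 4, 5, 6],
    [1, 6, 2, 3, 5, 4],
]
flags = consensus(ballots, 6)
```

In this invented data set only the second objective, whose ranks run from 1 to 6, fails the consensus test.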
The modal values have been underlined. This gives us a good notion
of central tendency. A rough and ready notion of variance is provided by
the semi-interquartile range (SIQR: 25th to 75th centile). For the six
items, it was as follows:

Item    SIQR

1       1+
2       2-3
3       1+
4       2
5       1-
6       1-

The smaller SIQR means the lesser variance. Maximal interrater 
agreement (i.e. a "value") would be indicated if each item has a SIQR 
of less than 1. Notice that both items 2 and 4 had large variances. In 
the case of 2, it was decided that the item (a) was ambiguous, and 
(b) was a supervenient, not terminal goal of the program. Item 4 was 
probably pretty vague too. A high variance is generally indicative of 
vagueness (lack of reliability). 

Of interest is the unanimity concerning items 5 and 6, which were
disvalues. There is an obvious difference between items 1-4 and 5-6.

When the results were communicated to staff at the same meeting,
it was suggested that the values expressed conformed to the structure
of the program. The program has been established with two somewhat
incompatible major goals. The various objectives related to one of these
major goals were values, rated high. The items related to the other
major goal were values, rated low. There had been included in the list
of objectives two supervenient objectives, which might have provided
the basis of resolution of the incompatibilities between the major goals.
Those items which related to the supervenient objectives showed no
consensus. They were found on inspection by the staff to be vague and
amorphous.